Abstract

Optical pattern recognition allows objects to be recognized from their images and 
permits their positional parameters to be estimated accurately in real time. The 
guiding principle behind optical pattern recognition is that a lens focusing a beam 
of coherent light modulated with an image produces the two-dimensional Fourier 
transform of that image. When the resulting output is further transformed by the 
matched filter corresponding to the original image, one obtains the autocorrelation 
function of the original image, which has a peak at the origin. Such a device is called 
an optical correlator and may be used to recognize and locate the image for which it is 
designed. (From a practical perspective, an approximation to the matched filter must 
be used since the spatial light modulator (SLM) on which the filter is implemented 
usually does not allow one to independently control both the magnitude and phase of 
the filter.) Generally, one is not just concerned with recognizing a single image, but is 
instead interested in recognizing a variety of rotated and scaled views of a particular 
image. In order to recognize these different views using an optical correlator, one 
may select a subset of these views (whose elements are called training images) and 
then use a composite filter that is designed to produce a correlation peak for each 
training image. Presumably, these peaks should be sharp and easily distinguishable 
from the surrounding correlation plane values. In this report we consider two areas 
of research regarding composite optical correlators. First, we consider the question 
of how best to choose the training images that are used to design the composite 
filter. With regard to quantity, the number of training images should be large enough 
to adequately represent all possible views of the targeted object yet small enough to 
ensure that the resolution of the filter is not exhausted. As for the images themselves, 
they should be distinct enough to avoid numerical difficulties yet similar enough to 
avoid gaps in which certain views of the target will be unrecognized. One method 
that we introduce to study this problem is called probing and involves the creation of 
artificial imagery. The second problem we consider involves the classification of the 
composite filter’s correlation plane data. In particular, we would like to determine 
not only whether or not we are viewing a training image, but, in the former case, we 
would like to determine which training image is being viewed. This second problem 
is investigated using traditional M-ary hypothesis testing techniques.
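The correlator principle described above can be imitated digitally. The sketch below is ours, not the report's. It assumes discrete images and uses the FFT with circular (wrap-around) correlation in place of the lens, and `correlation_plane` and `peak_location` are illustrative names. Multiplying the scene spectrum by the complex conjugate of the reference spectrum plays the role of the matched filter, so the autocorrelation peak appears at the origin, and a translated scene moves the peak by the same translation.

```python
import numpy as np


def correlation_plane(scene, reference):
    """Circular cross-correlation of `scene` with `reference` via the 2-D FFT.

    Multiplying by the conjugate of the reference spectrum is the digital
    analogue of the matched filter used in an optical correlator.
    """
    scene_spectrum = np.fft.fft2(scene)
    matched_filter = np.conj(np.fft.fft2(reference))
    return np.real(np.fft.ifft2(scene_spectrum * matched_filter))


def peak_location(plane):
    """Return the (row, column) index of the largest correlation value."""
    row, col = np.unravel_index(np.argmax(plane), plane.shape)
    return int(row), int(col)


rng = np.random.default_rng(0)
reference = rng.random((64, 64))

# Autocorrelation: the peak sits at the origin.
print(peak_location(correlation_plane(reference, reference)))  # (0, 0)

# A translated copy of the reference moves the peak by the same translation.
shifted = np.roll(reference, shift=(5, 12), axis=(0, 1))
print(peak_location(correlation_plane(shifted, reference)))  # (5, 12)
```

A physical correlator computes the same quantity in parallel with light; the SLM limitation mentioned above would replace `matched_filter` by a constrained approximation of it.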
Introduction & Background

A Review of Quadratic Classifiers Consider a random vector $X$ taking values
in $\mathbb{R}^d$ with a multivariate Gaussian¹ density function

$$p(x) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right) \tag{1}$$

where $\mu = E[X] = [\mu_i]_{i=1}^{d}$ and $\Sigma = E[(X-\mu)(X-\mu)^T] = [\sigma_{ij}]_{i,j=1}^{d}$. Note that this
distribution is completely characterized by $d + \frac{1}{2}d(d+1)$ parameters. We sometimes
denote this distribution by $N(\mu, \Sigma)$, where the covariance matrix $\Sigma$ is a symmetric,
nonnegative definite matrix. In writing expression (1), we have assumed that $\Sigma$ is in
fact positive definite, and hence invertible. If, instead, $\Sigma$ is singular, then $a^T(X-\mu) = 0$ a.s.
for some nonzero vector $a \in \mathbb{R}^d$. In such a case, we say that the distribution of $X$ is
degenerate. If $E[|a^T(X-\mu)|^2] = a^T\Sigma a > 0$ for each nonzero $a \in \mathbb{R}^d$, then the covariance matrix $\Sigma$ is
positive definite. The nonnegative square root of the value $(x-\mu)^T\Sigma^{-1}(x-\mu)$ in the
exponent of (1) is called the Mahalanobis distance from $x$ to the mean $\mu$. Note that
a set of points with equal Mahalanobis distance to the mean forms a hyperellipsoid
in $\mathbb{R}^d$.²
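The Mahalanobis distance and density (1) can be evaluated numerically as follows. This is a minimal sketch assuming a positive definite covariance matrix; the function names are illustrative. It solves a linear system instead of forming $\Sigma^{-1}$ and uses a log-determinant to avoid overflow in $|\Sigma|$.

```python
import numpy as np


def mahalanobis(x, mean, cov):
    """Mahalanobis distance sqrt((x - mean)^T cov^{-1} (x - mean)) from x to mean."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ z = diff rather than forming cov^{-1} explicitly.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))


def gaussian_density(x, mean, cov):
    """Density (1) of N(mean, cov); cov is assumed positive definite."""
    d = len(mean)
    _, logdet = np.linalg.slogdet(cov)
    r2 = mahalanobis(x, mean, cov) ** 2
    return float(np.exp(-0.5 * (d * np.log(2 * np.pi) + logdet + r2)))


mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
print(mahalanobis([2.0, -1.0], mean, cov))   # sqrt(2 / 1.75)
print(gaussian_density(mean, mean, cov))     # 1 / (2 pi sqrt(det cov)), det cov = 1.75
```

With $\Sigma = I$ the Mahalanobis distance reduces to the Euclidean distance, which is why its level sets are spheres in that case and hyperellipsoids in general.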

Our interest lies with the a posteriori density function

$$p(t_i \mid x) = \frac{p(x \mid t_i)\,P(t_i)}{p(x)}$$

where $t_i$ denotes an image with an a priori probability of $P(t_i)$. After taking the
natural log of the a posteriori density function and neglecting the terms that do not
change with $i$,³ we obtain a discriminant function of the form $g_i(x) = \ln p(x \mid t_i) +
\ln P(t_i)$. Assume now that $p(x \mid t_i)$ is a multivariate Gaussian density function with
mean vector $\mu_i$ and covariance matrix $\Sigma_i$. In this case, it follows that

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i) - \frac{d}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_i| + \ln P(t_i). \tag{2}$$

If $\Sigma_i = \sigma^2 I$, then equation (2) reduces to

$$g_i(x) = -\frac{\|x-\mu_i\|^2}{2\sigma^2} + \ln P(t_i) \tag{3}$$

¹ The components of $X$ are said to be jointly or mutually Gaussian. Whereas a sum of Gaussian
random variables need not be Gaussian, and uncorrelated Gaussian random variables need not be
independent, these useful properties do hold when the random variables are jointly Gaussian.

² For more information concerning the topics in this section, see [3] or [4].

³ Throughout this discussion, the $g_i$'s are defined only up to additive terms and positive
multiplicative factors that do not depend on $i$.
where $\|x-\mu_i\|^2 = (x-\mu_i)^T(x-\mu_i)$. Note that if the a priori probabilities are equal,
then this test assigns $x$ to the category that has the nearest mean, where distance
is measured by this norm. This type of classifier is called a minimum
distance classifier. If we expand $g_i$ a step further (retaining all but the $x^Tx$
term, which does not depend on $i$), equation (3) reduces to

$$g_i(x) = a_i^Tx + \beta_i \tag{4}$$

where $a_i = \mu_i/\sigma^2$ and $\beta_i = -\mu_i^T\mu_i/(2\sigma^2) + \ln P(t_i)$. A test of this form is called
a correlation detector or linear detector. As indicated by equation (4), the decision
boundaries induced by equation (3) are hyperplanes.
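A quick numerical check that equations (3) and (4) rank the categories identically may help. This sketch uses hypothetical random means and priors; it confirms that the two score vectors differ only by the common term $-x^Tx/(2\sigma^2)$, so their maximizers coincide.

```python
import numpy as np


def min_distance_scores(x, means, sigma2, priors):
    """Equation (3): -||x - mu_i||^2 / (2 sigma^2) + ln P(t_i) for every i."""
    sq_dist = np.sum((means - x) ** 2, axis=1)
    return -sq_dist / (2.0 * sigma2) + np.log(priors)


def linear_scores(x, means, sigma2, priors):
    """Equation (4): a_i^T x + beta_i with a_i = mu_i / sigma^2."""
    a = means / sigma2
    beta = -np.sum(means ** 2, axis=1) / (2.0 * sigma2) + np.log(priors)
    return a @ x + beta


rng = np.random.default_rng(1)
means = rng.normal(size=(4, 3))          # four hypothetical categories in R^3
priors = np.array([0.1, 0.2, 0.3, 0.4])
x = rng.normal(size=3)
s3 = min_distance_scores(x, means, 0.5, priors)
s4 = linear_scores(x, means, 0.5, priors)
# The score vectors differ only by the class-independent term -x^T x / (2 sigma^2).
print(np.allclose(s3 - s4, -x @ x / (2 * 0.5)))  # True
print(np.argmax(s3) == np.argmax(s4))            # True
```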

Next, consider the case in which $\Sigma_i$ is equal to some constant matrix $\Sigma$ for all $i$.
In this case, equation (2) reduces to

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T\Sigma^{-1}(x-\mu_i) + \ln P(t_i).$$

If the a priori probabilities are equal, then this test assigns $x$ to the distribution whose
mean has the smallest Mahalanobis distance to $x$. Expanding further, however, we
again obtain a test function of the form

$$g_i(x) = a_i^Tx + \beta_i$$

where this time $a_i = \Sigma^{-1}\mu_i$ and $\beta_i = -\frac{1}{2}\mu_i^T\Sigma^{-1}\mu_i + \ln P(t_i)$. Note that again we have
obtained a linear classifier and that our decision boundaries are hyperplanes.
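The same check works for a shared covariance matrix. In this sketch (hypothetical data; `shared_cov_linear` is an illustrative name), the linear coefficients $a_i = \Sigma^{-1}\mu_i$ and $\beta_i$ are computed once, and the Mahalanobis form differs from them only by $-\frac{1}{2}x^T\Sigma^{-1}x$, which is the same for every class.

```python
import numpy as np


def shared_cov_linear(means, cov, priors):
    """Return (A, beta) such that g_i(x) = A[i] @ x + beta[i] for a common covariance."""
    cov_inv_means = np.linalg.solve(cov, means.T).T     # row i is Sigma^{-1} mu_i
    beta = -0.5 * np.sum(means * cov_inv_means, axis=1) + np.log(priors)
    return cov_inv_means, beta


def shared_cov_quadratic(x, means, cov, priors):
    """Unexpanded form: -(1/2)(x - mu_i)^T Sigma^{-1} (x - mu_i) + ln P(t_i)."""
    diffs = x - means
    r2 = np.sum(diffs * np.linalg.solve(cov, diffs.T).T, axis=1)
    return -0.5 * r2 + np.log(priors)


rng = np.random.default_rng(2)
means = rng.normal(size=(3, 2))
B = rng.normal(size=(2, 2))
cov = B @ B.T + np.eye(2)                # symmetric positive definite
priors = np.array([0.5, 0.25, 0.25])
x = rng.normal(size=2)
A, beta = shared_cov_linear(means, cov, priors)
diff = shared_cov_quadratic(x, means, cov, priors) - (A @ x + beta)
# The difference is the same for every class: -(1/2) x^T Sigma^{-1} x.
print(np.allclose(diff, -0.5 * x @ np.linalg.solve(cov, x)))  # True
```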

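In the general case, equation (2) may be written as

$$g_i(x) = x^TA_ix + a_i^Tx + \beta_i$$

where $A_i = -\frac{1}{2}\Sigma_i^{-1}$, $a_i = \Sigma_i^{-1}\mu_i$, and $\beta_i = -\frac{1}{2}\mu_i^T\Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(t_i)$. Note
that this test function is quadratic and that the corresponding decision boundaries
are hyperquadrics.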
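The expansion above can be verified term by term. This sketch (illustrative names, hypothetical random parameters) builds $A_i$, $a_i$, $\beta_i$ from one class's mean, covariance, and prior and compares $x^TA_ix + a_i^Tx + \beta_i$ with equation (2) after dropping the class-independent $-\frac{d}{2}\ln(2\pi)$.

```python
import numpy as np


def quadratic_coefficients(mean, cov, prior):
    """Coefficients of g_i(x) = x^T A_i x + a_i^T x + beta_i from equation (2)."""
    cov_inv = np.linalg.inv(cov)
    A = -0.5 * cov_inv
    a = cov_inv @ mean
    _, logdet = np.linalg.slogdet(cov)
    beta = -0.5 * mean @ cov_inv @ mean - 0.5 * logdet + np.log(prior)
    return A, a, beta


def g_direct(x, mean, cov, prior):
    """Equation (2) without the class-independent -(d/2) ln(2 pi) term."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * diff @ np.linalg.solve(cov, diff) - 0.5 * logdet + np.log(prior)


rng = np.random.default_rng(3)
mean = rng.normal(size=3)
B = rng.normal(size=(3, 3))
cov = B @ B.T + 0.5 * np.eye(3)
x = rng.normal(size=3)
A, a, beta = quadratic_coefficients(mean, cov, 0.3)
print(np.isclose(x @ A @ x + a @ x + beta, g_direct(x, mean, cov, 0.3)))  # True
```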

The Classification Problem in Bayesian Inference Consider $N_0$ classes of objects
denoted by $\omega_1, \ldots, \omega_{N_0}$. Assume that object class $\omega_k$ contains $N_k$ training images
denoted by $T_{1k}, \ldots, T_{N_k k}$. Let $N_T = \sum_{k=1}^{N_0} N_k$ denote the total number of training
images. The standard classification problem seeks a partition of the “signal space”
into $N_T$ classification regions denoted by $R_1, \ldots, R_{N_T}$. The composite classification
problem, on the other hand, seeks a partition of the “signal space” into $N_0$ regions
denoted by $C_1, \ldots, C_{N_0}$; that is, each region corresponds to a different object class,
where each object class can contain many different training images.

Let $N_F$ denote the number of composite filters that are formed from the training
images, and let $N_M$ denote the number of “features” that are required by the chosen
correlation metric. Then, the dimension $n$ of our signal space is given by $N_F N_M$. (In
this context, a “signal” corresponds to the output of our composite filter. Our goal
is to classify these outputs into their corresponding object classes. The goal of the
standard classification problem would be to map each possible output to the training
image that produced it. Note that, for the moment, we are assuming that only
training images are available as inputs to our system.) Once we develop our test, our
goals will be to choose the training image groups and composite filters so that (1)
the distributions of the test statistics can be estimated accurately with as few samples
as possible, and (2) the distributions of the test statistics do not significantly overlap.
(The first goal has pragmatic motivations, and the second goal reflects the standard
desire that our test be as close to singular as possible.)

Let $X$ denote our signal; that is, let $X$ be a random vector taking values in $\mathbb{R}^{N_M N_F}$
that represents the $N_M$ outputs of each of the $N_F$ filters. Let $p_{T_{jk}}(x)$ be a probability
density function for $X$ when the training image $T_{jk}$ is used as the input. Let $\Pi_{jk}$
denote the a priori probability that the input to our system is training image $T_{jk}$. We
will use these a priori probabilities to develop a Bayesian test. One assumption at
this point is, of course, that these probabilities both exist and are known. Further, we
will assume that the sum of the $\Pi_{jk}$'s over all possible training images is unity; that
is, as mentioned above, we will continue to assume that only training images are input
into our system. Based upon these assumptions, we have the following expression for
a probability density function for the output of our system:

$$p_T(x) = \sum_{k=1}^{N_0}\sum_{j=1}^{N_k}\Pi_{jk}\,p_{T_{jk}}(x).$$

Note that this density is only exact if our inputs are exclusively training images. In a
more general setting in which our inputs need not be training images, we would hope
that this density would be a close approximation of the true density of the output.
(An interesting problem would be to investigate the behavior of this approximation
as $N_T \to \infty$.)


The Standard Classifier For a standard classifier, we let $N_0 = N_T$; that is,
our object classes are singleton sets, each containing a single training image. In this
case, we will denote $T_{1k}$ by $T_k$ and $\Pi_{1k}$ by $\Pi_k$. According to the usual Bayes formula,
we have

$$p(T_k \mid X = x) = \frac{p_{T_k}(x)\,\Pi_k}{p_T(x)}$$

where $p(T_k \mid X = x)$ denotes the conditional probability that $T_k$ was input given that
we observed $x$. Our Bayesian hypothesis test then is to assign $x$ to $T_k$ if and only if
$p(T_k \mid X = x) \ge p(T_i \mid X = x)$ for all $i = 1, \ldots, N_T$. (That is, we choose the training
image corresponding to a maximum a posteriori probability.)
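The standard classifier can be sketched as follows, assuming (as the report does a little later) that each $p_{T_k}$ is Gaussian. The means, covariances, and priors below are hypothetical. The denominator $p_T(x)$ is the mixture density defined earlier; it is evaluated in the log domain because Gaussian densities in high dimensions underflow easily.

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """Log of the n-variate Gaussian density at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    n = len(mean)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


def standard_posteriors(x, means, covs, priors):
    """p(T_k | X = x) for every training image T_k via Bayes' formula.

    The normalizer is the mixture density p_T(x) = sum_k Pi_k p_{T_k}(x),
    computed with logaddexp for numerical stability.
    """
    log_joint = np.array([log_gaussian(x, m, c) + np.log(p)
                          for m, c, p in zip(means, covs, priors)])
    log_pt = np.logaddexp.reduce(log_joint)        # log p_T(x)
    return np.exp(log_joint - log_pt)


means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
covs = [np.eye(2)] * 3
priors = [0.5, 0.25, 0.25]
post = standard_posteriors(np.array([2.8, 0.1]), means, covs, priors)
print(post.round(3))
print(int(np.argmax(post)))   # 1 (0-based): the MAP decision is the second training image
```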
The Composite Classifier For a composite classifier, $N_0$ is less than (usually
much less than) $N_T$. Let $p_{\omega_k}(x)$ be a probability density function for $X$ when the
input is a training image from the object class $\omega_k$. Let $\Lambda_k$ be the a priori probability
that the input will be from class $\omega_k$. (Yet again, we are assuming that the input will
always be a training image.) Note that

$$\Lambda_k = \sum_{j=1}^{N_k}\Pi_{jk}.$$


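Also, note that

$$p_{\omega_k}(x) = \frac{1}{\Lambda_k}\sum_{j=1}^{N_k}\Pi_{jk}\,p_{T_{jk}}(x)$$

and further, using Bayes’ formula, note that

$$p(\omega_k \mid X = x) = \frac{p_{\omega_k}(x)\,\Lambda_k}{p_T(x)} = \frac{1}{p_T(x)}\sum_{j=1}^{N_k}\Pi_{jk}\,p_{T_{jk}}(x) \tag{5}$$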


where $p(\omega_k \mid X = x)$ is the conditional probability that the input is from class $\omega_k$ given
that $x$ is observed. Our test in this case assigns $x$ to object class $\omega_k$ if and only if
$p(\omega_k \mid X = x) \ge p(\omega_j \mid X = x)$ for all $j = 1, \ldots, N_0$. That is, we assign $x$ to class $\omega_k$ if
and only if

$$\frac{p(\omega_k \mid X = x)}{p(\omega_j \mid X = x)} = \frac{\sum_{i=1}^{N_k}\Pi_{ik}\,p_{T_{ik}}(x)}{\sum_{i=1}^{N_j}\Pi_{ij}\,p_{T_{ij}}(x)} \ge 1$$

for all $j = 1, \ldots, N_0$.
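Equation (5) can be illustrated in one dimension with the standard library. In this sketch the two object classes, their training-image densities, and the priors $\Pi_{jk}$ are all hypothetical. Each class score is the within-class weighted sum of component densities, and dividing by $p_T(x)$ turns the scores into posteriors. For high-dimensional outputs one would work with logarithms instead, as in the previous sketch.

```python
from statistics import NormalDist

# Hypothetical 1-D example: two object classes with two training images each.
# Each entry is (Pi_jk, NormalDist(m_jk, sigma_jk)); the Pi_jk sum to one.
classes = {
    "omega_1": [(0.2, NormalDist(0.0, 1.0)), (0.2, NormalDist(1.0, 1.0))],
    "omega_2": [(0.3, NormalDist(5.0, 1.0)), (0.3, NormalDist(6.0, 1.0))],
}


def composite_posteriors(x):
    """p(omega_k | X = x) from equation (5): within-class weighted sums of the
    training-image densities, divided by the mixture density p_T(x)."""
    numerators = {k: sum(pi * dist.pdf(x) for pi, dist in comps)
                  for k, comps in classes.items()}
    p_T = sum(numerators.values())
    return {k: v / p_T for k, v in numerators.items()}


post = composite_posteriors(0.6)
decision = max(post, key=post.get)
print(decision)  # omega_1
```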

We will now assume that $p_{T_{jk}}(x)$ is an $n$-variate Gaussian density function with
mean vector $m_{jk}$ and covariance matrix $C_{jk}$, where we recall that $n = N_M N_F$. Note
that, in this case, $p_T(x)$ and $p_{\omega_k}(x)$ are mixtures of multivariate Gaussian densities,
which, of course, can be far from Gaussian. Several problems present themselves at
this point. First, the rather unwieldy distribution of $X$ does not bode well for analytic
solutions.⁴ Second, the parameters $m_{jk}$ and $C_{jk}$ are rarely known and hence often
must be estimated. These estimates may then be substituted into the test given above.
Unfortunately, when this substitution is done, our test is generally no longer Bayesian,
and hence need no longer satisfy any desired optimality property such as minimum
probability of error.⁵ Although the test statistic is difficult to work with, one possible
simplification involves coordinate transformations that simplify the calculation of the
Mahalanobis distances in the exponents.

⁴ To be precise, it is only our approximation of the distribution of $X$ that is a mixture of Gaussian
distributions, unless we assume that our inputs will only be training images. Of course, there is no
reason to expect that the true distribution of $X$ is any less unwieldy than our approximation of that
distribution.

⁵ The likelihood ratio corresponding to such a test is sometimes said to be a generalized likelihood
ratio, particularly when the unknown parameters are replaced with maximum likelihood estimates.
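The report does not say which coordinate transformation it has in mind; one common choice, sketched here under the assumption that each $C_{jk}$ is positive definite, is a Cholesky factorization $C_{jk} = LL^T$ computed once per training image. Afterwards, $(x - m_{jk})^TC_{jk}^{-1}(x - m_{jk}) = \|L^{-1}(x - m_{jk})\|^2$ and $\ln|C_{jk}|^{1/2} = \sum_i \ln L_{ii}$, so each new observation needs only one triangular solve per component.

```python
import numpy as np


def whitening_transform(cov):
    """Precompute the Cholesky factor L (cov = L L^T) and log|cov|^{1/2}."""
    L = np.linalg.cholesky(cov)
    half_logdet = float(np.sum(np.log(np.diag(L))))
    return L, half_logdet


def whitened_sq_distance(x, mean, L):
    """(x - m)^T C^{-1} (x - m), computed as ||L^{-1}(x - m)||^2.

    np.linalg.solve ignores the triangular structure; in practice
    scipy.linalg.solve_triangular(L, x - mean, lower=True) would exploit it.
    """
    y = np.linalg.solve(L, x - mean)
    return float(y @ y)


rng = np.random.default_rng(4)
B = rng.normal(size=(4, 4))
cov = B @ B.T + np.eye(4)
mean, x = rng.normal(size=4), rng.normal(size=4)
L, half_logdet = whitening_transform(cov)
direct = (x - mean) @ np.linalg.solve(cov, x - mean)
print(np.isclose(whitened_sq_distance(x, mean, L), direct))       # True
print(np.isclose(half_logdet, 0.5 * np.linalg.slogdet(cov)[1]))   # True
```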

Training Images 

In the previous section, we assumed that the input to our system was always a 
training image, that the output of our system given that the input was a training 
image was Gaussian, and that the output of our system given that the input was a 
training image from a particular object class was equal in distribution to a mixture 
of Gaussian distributions. 

There are two types of non-training images. A first category non-training image is 
an unknown view of a known object. Ideally, this type of image will be close enough to 
an appropriate training image so that their distributions will have significant overlap. 
A second category non-training image is an image of an object that is not intended 
to be recognized. Ideally, this type of image should be far away from the training 
images so that its distribution will not have significant overlap with that of any 
training image. 

Of course, as we approach the ideal situation, we are generally going to need an 
ever increasing supply of training images. While a larger number of training images 
would be helpful when the input is a non-training image, it would also increase the 
complexity of our system and it would decrease the performance when the input 
actually is a training image. What we need is: (1) a method of determining how many 
training images we need to meet some desired goal, and (2) a method of obtaining 
appropriate additional training images when such images are required. In the next 
section, we will consider both of these problems. 

Probing In [2] the term probing is introduced and used to describe the creation of
artificial images to improve and analyze pattern recognition algorithms. A deterministic
pattern recognition algorithm is simply a function mapping the set of all possible
images to some decision set. Ideally, one would choose this function by considering
each possible image in turn and determining for each the appropriate decision that
should be made if that image appears as the input to our system. Unfortunately, however,
such a design procedure is generally not tractable due to the enormous number
of possible input scenes. (For example, there are over $10^{39{,}000}$ different 128 by 128
pixel input images with 256 gray scales.) It is at this point that probing becomes
useful, since it allows one to intelligently sample this enormous image space in order
to select images for which a pattern recognition scheme can be designed.
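The size of that image space follows from simple counting: each of the $128 \times 128$ pixels independently takes one of 256 values, giving $256^{16384}$ images. Working with the base-10 logarithm avoids building the huge integer itself.

```python
import math

# Number of distinct 128 x 128 images with 256 gray levels is 256 ** (128 * 128).
# Work with its base-10 logarithm instead of the (very large) integer itself.
exponent = 128 * 128 * math.log10(256)
print(math.floor(exponent))   # 39456, so there are more than 10**39456 such images
```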

We have identified three different methods of probing that appear promising with 
regard to optical information processing. First, probing can be used to form a “map” 
of the image space. As an example, consider two images $I_1$ and $I_2$ from the image
space consisting of all $N \times N$ pixel images composed of $M$ gray scale levels. (Note
that such an image can be viewed as a point in the set $\{0, 1, \ldots, M-1\}^{N^2}$.) For each
value of $p \in [0, 1]$, let $V_p$ be a random object taking values in $\{0, 1, \ldots, M-1\}^{N^2}$ such
that each pixel value in $V_p$ is, independently of the other pixels, equal with probability $p$
to the corresponding pixel value in $I_2$ and equal with probability $1-p$ to the corresponding
pixel value in $I_1$.⁶ (Note that $V_0 = I_1$ and $V_1 = I_2$. Note also that if a particular pixel
in $I_1$ and $I_2$ agree, then they also agree with the corresponding pixel in $V_p$. Thus, the
pixel variance in $V_p$ is only positive for pixels at which $I_1$ and $I_2$ disagree.) As $p$
increases from 0 to 1,⁷ our decision algorithm (when applied to $V_p$) will on average no
longer recognize $I_1$ at some point $p_1$ and will begin instead to recognize $I_2$ at some
point $p_2$. These average values of $p$ allow us to determine when images are “adjacent”
and to recognize the possible existence of a “hole” between $I_1$ and $I_2$, that is, an
interval $p_1 < p < p_2$ on which neither image is recognized. A map of this sort allows us
to distinguish between two procedures that perform the same when the inputs are always
training images. Further, we could possibly use a realization of $V_p$ where $p_1 < p < p_2$ to
fill such a hole in our set of training images. Of course, different realizations of $V_p$
for some fixed choice of $p \in (0, 1)$ could be quite different. (A similar procedure is
described in [7] where sections of training images are randomly selected and weighted
to form what is called a synthetic reference object.)
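The map between two images can be sketched as below. The images and the decision rule `toy_classify` are hypothetical stand-ins for a real correlator: it recognizes $I_k$ when more than 70% of the pixels match $I_k$ exactly. Because two random images almost never agree, the fraction of pixels matching $I_1$ is about $1-p$, so this rule should stop recognizing $I_1$ near $p_1 \approx 0.3$ and start recognizing $I_2$ near $p_2 \approx 0.7$, leaving a hole in between.

```python
import numpy as np


def probe_between(image1, image2, p, rng):
    """One realization of V_p: each pixel independently equals the image2 pixel
    with probability p and the image1 pixel with probability 1 - p."""
    take_second = rng.random(image1.shape) < p
    return np.where(take_second, image2, image1)


def recognition_rates(image1, image2, classify, probs, trials, rng):
    """For each p, the fraction of realizations of V_p that `classify` labels 1 and 2."""
    rates = []
    for p in probs:
        labels = [classify(probe_between(image1, image2, p, rng)) for _ in range(trials)]
        rates.append((p, labels.count(1) / trials, labels.count(2) / trials))
    return rates


# Hypothetical setup: random 32 x 32 images with M = 256 gray levels.
rng = np.random.default_rng(5)
I1 = rng.integers(0, 256, size=(32, 32))
I2 = rng.integers(0, 256, size=(32, 32))


def toy_classify(img):
    """Toy decision rule: 1 or 2 if >70% of pixels match I1 or I2, else 0."""
    if np.mean(img == I1) > 0.7:
        return 1
    if np.mean(img == I2) > 0.7:
        return 2
    return 0                      # neither image recognized


rates = recognition_rates(I1, I2, toy_classify, [0.0, 0.2, 0.5, 0.8, 1.0], 20, rng)
for p, r1, r2 in rates:
    print(f"p={p:.1f}  recognized as I1: {r1:.2f}  as I2: {r2:.2f}")
```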

Second, probing can be used to measure the robustness of a pattern recognition 
algorithm. For a training image $T$ and a nonnegative value $\theta$, let $U_\theta$ be an image
(i.e., a random object taking values in $\{0, 1, \ldots, M-1\}^{N^2}$) such that each pixel value
in $U_\theta$ has a mean equal to the corresponding pixel value in $T$ and has a variance
equal to $\theta$. Note that $U_0 = T$ a.s. The average value of $\theta$ at which $T$ is no longer
recognized provides an indication of the robustness of our algorithm with respect to 
that particular training image. Presumably this value should be large and should 
not vary widely among the different training images. Tests of this sort allow us 
to distinguish between two procedures that perform identically when the inputs are 
always training images. 
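A robustness probe of this kind can be sketched as follows. The noise model is our assumption: Gaussian noise of variance $\theta$ is added, rounded, and clipped to the gray-level range, so the pixel mean and variance equal $T$ and $\theta$ only approximately (exactly at $\theta = 0$, and not near the extremes 0 and $M-1$). The rule `recognizes` is a hypothetical stand-in for a correlator decision.

```python
import numpy as np


def perturb(T, theta, M, rng):
    """One realization of U_theta (assumed model): add Gaussian noise of
    variance theta to T, round, and clip to {0, ..., M-1}.  Rounding and
    clipping make the pixel mean and variance only approximately T and theta."""
    noisy = T + rng.normal(0.0, np.sqrt(theta), size=T.shape)
    return np.clip(np.rint(noisy), 0, M - 1).astype(T.dtype)


def breakdown_theta(T, recognizes, thetas, M, trials, rng):
    """Average, over independent trials, of the first theta in the increasing
    grid `thetas` at which `recognizes` returns False.  Trials that never fail
    are recorded at the last grid value, which then understates the breakdown."""
    first_failures = []
    for _ in range(trials):
        failure = thetas[-1]
        for theta in thetas:
            if not recognizes(perturb(T, theta, M, rng)):
                failure = theta
                break
        first_failures.append(failure)
    return float(np.mean(first_failures))


# Hypothetical toy rule: T is recognized while the mean absolute pixel error is below 5.
rng = np.random.default_rng(8)
T = rng.integers(0, 256, size=(32, 32))
thetas = [10.0 * k for k in range(11)]      # 0, 10, ..., 100


def recognizes(img):
    return np.mean(np.abs(img - T)) < 5.0


result = breakdown_theta(T, recognizes, thetas, 256, 10, rng)
print(result)
```

Running the same probe for every training image gives the per-image robustness values that, as noted above, should be large and should not vary widely.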

Third, probing can be employed in which the pixel values in the synthesized image 
are not statistically independent. By introducing some sort of spatially localized 
dependence, we can analyze a situation in which a pixel in the synthesized image is 
more likely to be from a particular training image if its neighboring pixel values are 
also from that training image. 
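One simple way to introduce spatially localized dependence (our choice for illustration; the report does not prescribe a model) is to draw the source of each pixel at a coarser resolution: the image is tiled into squares, and each square is copied whole from $I_2$ with probability $p$, so neighboring pixels usually come from the same training image. Thresholding a smoothed random field would be another reasonable option.

```python
import numpy as np


def block_probe(image1, image2, p, block, rng):
    """Spatially dependent probe (assumed model): tile the image into
    block x block squares; each square is copied whole from image2 with
    probability p, independently of the other squares, else from image1."""
    rows, cols = image1.shape
    coarse = rng.random((-(-rows // block), -(-cols // block))) < p   # ceiling division
    mask = coarse.repeat(block, axis=0).repeat(block, axis=1)[:rows, :cols]
    return np.where(mask, image2, image1)


rng = np.random.default_rng(9)
I1 = np.zeros((30, 30), dtype=int)
I2 = np.ones((30, 30), dtype=int)
V = block_probe(I1, I2, 0.5, 8, rng)
print(V[::8, ::8])    # one entry per square: 1 means the square came from I2
```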

⁶ In addition, one could further perturb the image by replacing a given gray scale value
$M_0$ with a realization of a random variable having a unimodal distribution on $\{0, 1, \ldots, M-1\}$
centered at $M_0$.

⁷ Notice that, from an information-theoretic standpoint, the “uncertainty” is maximized when
$p = 1/2$.
Composite Correlation

Synthetic Discriminant Functions The synthetic discriminant function approach
uses a linear combination of training images to create a composite image that is
cross-correlated with the input to the system. The weights in the linear combination are
chosen so that the cross correlation at the origin is the same for all training images
from the same class. (The resulting filter is sometimes called an Equal Correlation
Peak (ECP) Synthetic Discriminant Function (SDF).) For example, if we have $N$
training images $s_1(x,y), \ldots, s_N(x,y)$, then our composite image would be of the form

$$h(x,y) = \alpha_1 s_1(x,y) + \cdots + \alpha_N s_N(x,y),$$

and the $\alpha_i$'s would be chosen so that

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x,y)\,s_i(x,y)\,dx\,dy = c_i$$

for $i = 1, \ldots, N$, where the $c_i$'s are preselected constants. Modifications of the standard
SDF approach exist that impose other constraints. For example, the Minimum
Variance SDF (MVSDF) minimizes the output noise variance, and the Minimum Average
Correlation Energy (MACE) filter attempts to produce sharp correlation peaks
at the origin of the output.
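In discrete form, substituting $h = \sum_j \alpha_j s_j$ into the constraints gives the linear system $R\alpha = c$, where $R_{ij} = \langle s_i, s_j \rangle$ is the Gram matrix of the training images. The sketch below assumes real-valued images stored as arrays (`ecp_sdf` is an illustrative name). $R$ becomes nearly singular when training images are nearly linearly dependent, which is one concrete form of the numerical difficulties mentioned in the abstract.

```python
import numpy as np


def ecp_sdf(training_images, c):
    """Discrete ECP SDF: weights alpha with <h, s_i> = c_i for h = sum_i alpha_i s_i.

    Substituting h into the constraints gives R alpha = c, where
    R[i, j] = sum over pixels of s_i * s_j (the Gram matrix).
    """
    S = np.stack([np.ravel(s) for s in training_images]).astype(float)  # N x pixels
    R = S @ S.T
    alpha = np.linalg.solve(R, np.asarray(c, dtype=float))
    h = (alpha @ S).reshape(np.shape(training_images[0]))
    return h, alpha


rng = np.random.default_rng(10)
images = [rng.random((16, 16)) for _ in range(3)]   # hypothetical training images
h, alpha = ecp_sdf(images, [1.0, 1.0, 1.0])
print([round(float(np.sum(h * s)), 6) for s in images])   # [1.0, 1.0, 1.0]
```

Checking `np.linalg.cond` of the Gram matrix before solving is a cheap way to flag training sets whose images are too similar.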

Let $H(u,v)$ denote the Fourier transform of the composite image $h(x,y)$; that is,
let

$$H(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x,y)\,e^{-j2\pi(xu+yv)}\,dx\,dy.$$

Note that

$$h(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} H(u,v)\,e^{j2\pi(xu+yv)}\,du\,dv.$$

Also, let $S_i$ denote the Fourier transform of the $i$th training image $s_i$ for $i = 1, \ldots, N$.
Further, for an element $x$ from $\mathbb{R}^N$ or $\mathbb{C}^N$, let $\tilde{x}$ denote the corresponding element in
$\mathbb{R}^{N \times N}$ or $\mathbb{C}^{N \times N}$ that is obtained by placing $x$ along the diagonal and 0 elsewhere.
That is, let

$$\tilde{x}_{ij} = \begin{cases} x_i & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases}$$

Finally, assume that the input to our system is corrupted with an additive, zero-mean,
wide-sense stationary noise process $N(x,y)$ with power spectral density $P_N(u,v)$.
That is,

$$P_N(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E[N(x+\tau, y+\lambda)N^*(x,y)]\,e^{-j2\pi(\tau u+\lambda v)}\,d\tau\,d\lambda$$

where the asterisk denotes the complex conjugate operation. We will now list several
popular performance criteria.

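1. The Output Noise Variance

$$\mathrm{ONV} = H^*(u,v)^T\,\tilde{P}_N(u,v)\,H(u,v)$$

2. The Average Similarity Measure

$$\mathrm{ASM} = H^*(u,v)^T\left[\frac{1}{N}\sum_{i=1}^{N}\big(\tilde{S}_i(u,v)-\tilde{M}_s(u,v)\big)^*\big(\tilde{S}_i(u,v)-\tilde{M}_s(u,v)\big)\right]H(u,v)$$

where $M_s(u,v) = \frac{1}{N}\sum_{i=1}^{N} S_i(u,v)$

3. The Average Correlation Energy

$$\mathrm{ACE} = H^*(u,v)^T\left[\frac{1}{N}\sum_{i=1}^{N}\tilde{S}_i^*(u,v)\,\tilde{S}_i(u,v)\right]H(u,v)$$

4. The Average Correlation Height

$$\mathrm{ACH} = \frac{1}{N}\sum_{i=1}^{N} H^*(u,v)^T S_i(u,v)$$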
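Because every matrix in these criteria is diagonal, each quadratic form reduces to a weighted sum of $|H|^2$ over frequencies. The sketch below assumes that $H(u,v)$, $S_i(u,v)$, and $P_N(u,v)$ have been sampled on a common frequency grid and flattened into vectors; that representation and the function name are ours. It also checks the identity $\mathrm{ACE} = \mathrm{ASM} + \sum |H|^2|M_s|^2$, which shows why minimizing ACE and ASM are related but distinct goals.

```python
import numpy as np


def performance_criteria(H, S, P_N):
    """ONV, ASM, ACE and ACH for a filter sampled at n_freq frequencies.

    H   : complex array, shape (n_freq,), the filter H(u, v) in lexicographic order.
    S   : complex array, shape (N, n_freq), the training-image spectra S_i(u, v).
    P_N : real array, shape (n_freq,), the noise power spectral density.
    Each diagonal quadratic form H^{*T} D H equals sum(diag(D) * |H|^2).
    """
    H2 = np.abs(H) ** 2
    M_s = S.mean(axis=0)
    onv = float(np.sum(P_N * H2))
    asm = float(np.sum(H2 * np.mean(np.abs(S - M_s) ** 2, axis=0)))
    ace = float(np.sum(H2 * np.mean(np.abs(S) ** 2, axis=0)))
    ach = complex(np.mean(S @ np.conj(H)))        # (1/N) sum_i H^{*T} S_i
    return {"ONV": onv, "ASM": asm, "ACE": ace, "ACH": ach}


rng = np.random.default_rng(11)
n_images, n_freq = 4, 64
S = rng.normal(size=(n_images, n_freq)) + 1j * rng.normal(size=(n_images, n_freq))
H = rng.normal(size=n_freq) + 1j * rng.normal(size=n_freq)
P_N = np.ones(n_freq)                     # white noise with unit spectral density
crit = performance_criteria(H, S, P_N)
M = S.mean(axis=0)
print(np.isclose(crit["ACE"], crit["ASM"] + np.sum(np.abs(H) ** 2 * np.abs(M) ** 2)))  # True
```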

Topics for Further Research If the previous performance criteria were our 
only concern then our goal in choosing h would be to minimize ONV, ACE, and ASM, 
and to maximize ACH. Some immediate questions that arise are: 

1. Is it possible to optimize any or all of these parameters simultaneously? (In 
general, the answer is no for the performance criteria listed above. However, 
this question and those that follow should be considered whenever additional 
performance criteria are included.) 

2. If not, then can we fix one or more of the parameters and then optimize those 
that remain? Note that a similar procedure is used to obtain a Neyman-Pearson 
test. Since it is not generally possible to simultaneously maximize the power and minimize the
size of a test, a Neyman-Pearson test maximizes the power while holding the
size fixed at a prescribed level. (The proof of this result follows from standard techniques of
variational calculus.)
3. There are numerous paradoxes that arise in the Neyman-Pearson theory. Do
similar difficulties arise here? For example, in a Neyman-Pearson test, strange 
results can occur if the false alarm rate is chosen to be large in a test that is 
almost singular. Conversely, situations exist in which the power exceeds the 
false alarm rate by an arbitrarily small value. 

4. An optimal trade-off filter is obtained by fixing all but one of the parameters
and optimizing the remaining one. Is this necessary? Might it be possible to fix fewer
parameters and then optimize those that remain?
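The Neyman-Pearson trade-off referred to in items 2 and 3 can be seen in the simplest case, a one-dimensional Gaussian mean shift (our choice of example). With the size $\alpha$ fixed, the threshold is set by the null distribution alone, and the power then depends only on the shift: a large shift gives a nearly singular test with power close to 1, while a tiny shift gives power that exceeds the false alarm rate by an arbitrarily small amount.

```python
from statistics import NormalDist

STD = NormalDist()   # standard normal distribution


def np_threshold_and_power(alpha, shift):
    """Neyman-Pearson test of H0: X ~ N(0, 1) against H1: X ~ N(shift, 1), shift > 0.

    The likelihood ratio is increasing in x, so the most powerful size-alpha test
    rejects H0 when x exceeds the threshold t with P0(X > t) = alpha.
    """
    t = STD.inv_cdf(1.0 - alpha)
    power = 1.0 - STD.cdf(t - shift)
    return t, power


for shift in (3.0, 0.5, 0.01):
    t, power = np_threshold_and_power(0.05, shift)
    print(f"shift={shift:5.2f}  threshold={t:.3f}  power={power:.4f}")
```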

Minimum Euclidean Distance Optimal Filters Our goal in this section is to 
extend the results in [5] to include composite filters. In particular, we will first 
seek an algorithm by which the output of the minimum Euclidean distance optimal filter (MEDOF) algorithm can be classified
by statistical inference into training image classes. The following steps follow the 
procedure suggested in [1]: 

1. Separate the training images into object classes. The first step is to 
select the training images and object classes. Each object class should correspond to 
a different object that we wish to recognize. The training images within each object 
class should be chosen to adequately describe the different expected orientations of 
the object they represent. 

2. Create the filters by which these training images will be distinguished.
One way in which this step could be achieved would be to create a composite
image from each object class based upon the training images in that object
class. These composite images could then be used to create composite filters whose
combined outputs would comprise the components of our output random vector $X$.
(That is, we would let $N_F = N_0$.)

3. Estimate the mean and covariance matrix for each class. We have
assumed that the output of our system given that the input is a specific training image
$T_{jk}$ will be multivariate Gaussian with mean $m_{jk}$ and covariance matrix $C_{jk}$. We
will estimate these parameters via standard techniques. In particular, if we observe
$x_1, \ldots, x_N$ when training image $T_{jk}$ is our input, then we will estimate $m_{jk}$ via

$$\hat{m}_{jk} = \frac{1}{N}\sum_{i=1}^{N} x_i,$$

and we will estimate $C_{jk}$ via

$$\hat{C}_{jk} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \hat{m}_{jk})(x_i - \hat{m}_{jk})^T.$$
4. Calculate the generalized weighted Gaussian sum for each class. The
Bayes test consists of selecting the object class for which equation (5) is largest. In
our case, however, we must substitute the estimates we just obtained in place of the
true parameters, which are unknown. Thus, for each object class $\omega_k$ from the set of $N_0$
different object classes, we calculate the ratio (the common factor $(2\pi)^{-n/2}$ cancels)

$$\frac{\sum_{j=1}^{N_k}\Pi_{jk}\,|\hat{C}_{jk}|^{-1/2}\exp\left(-\frac{1}{2}(x-\hat{m}_{jk})^T\hat{C}_{jk}^{-1}(x-\hat{m}_{jk})\right)}{\sum_{l=1}^{N_0}\sum_{i=1}^{N_l}\Pi_{il}\,|\hat{C}_{il}|^{-1/2}\exp\left(-\frac{1}{2}(x-\hat{m}_{il})^T\hat{C}_{il}^{-1}(x-\hat{m}_{il})\right)}. \tag{6}$$

5. Choose the class with the largest weighted sum. We select the object 
class for which the corresponding term found in expression (6) is the largest. Our 
test then announces that the input to our system belongs to this object class. 
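Steps 3 to 5 can be chained in a short simulation. Everything in the sketch is hypothetical: two object classes with two training images each, $n = 3$ filter outputs, Gaussian outputs with identity covariance, and equal priors $\Pi_{jk} = 0.25$. Expression (6) is evaluated in the log domain so the exponentials do not underflow. Note that a sample covariance built from $N$ observations has rank at most $N-1$, so at least $n+1$ observations per training image are needed before $\hat{C}_{jk}$ can be inverted.

```python
import numpy as np


def estimate(samples):
    """Step 3: sample mean and unbiased sample covariance (rows are observations)."""
    return samples.mean(axis=0), np.cov(samples, rowvar=False)


def log_term(x, pi, m, C):
    """log of Pi |C|^{-1/2} exp(-(1/2)(x - m)^T C^{-1} (x - m)); the common
    (2 pi)^{-n/2} factor cancels in expression (6) and is omitted."""
    diff = x - m
    _, logdet = np.linalg.slogdet(C)
    return np.log(pi) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(C, diff)


def classify(x, model):
    """Steps 4-5: ratios of expression (6) and the index of the largest one.
    model[k] lists (Pi_jk, m_hat_jk, C_hat_jk) for the training images of class k."""
    log_num = np.array([np.logaddexp.reduce([log_term(x, *t) for t in cls])
                        for cls in model])
    ratios = np.exp(log_num - np.logaddexp.reduce(log_num))
    return int(np.argmax(ratios)), ratios


# Hypothetical simulation: 2 object classes, 2 training images each, n = 3 outputs.
rng = np.random.default_rng(6)
true_means = [[np.zeros(3), np.ones(3)], [np.full(3, 4.0), np.full(3, 5.0)]]
model = []
for class_means in true_means:
    entries = []
    for m in class_means:
        samples = rng.multivariate_normal(m, np.eye(3), size=50)   # 50 > n + 1
        m_hat, C_hat = estimate(samples)
        entries.append((0.25, m_hat, C_hat))
    model.append(entries)

test_x = rng.multivariate_normal(np.full(3, 4.0), np.eye(3))   # drawn near class 1
k, ratios = classify(test_x, model)
print(k, ratios.round(3))
```

Because the estimates replace the true parameters, the resulting decision is a plug-in rule rather than the Bayes test itself, which is the second caveat below.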

Caveats 

1. The calculation of expression (6) is very computationally intensive. This problem
can be lessened somewhat by appropriate coordinate transformations.

2. The insertion of the estimates in place of the true parameters may generally be 
expected to remove any optimality condition that the original test was designed 
to satisfy. 

3. The procedure is based upon an initial assumption of normality that may or 
may not be justified. In particular, an important concern is the robustness of 
our test with regard to perturbations in the underlying distributions. Also, 
what modifications would be required in order to remove the assumption of 
normality? 

4. The procedure requires one to possess the a priori probability associated with
each training image. These probabilities are generally not known and hence 
must be either estimated or assumed. Again, the robustness of our procedure 
with respect to these values is an important concern. 

Conclusion 

We have presented an overview of our research in two areas of optical pattern recognition.
First, we have considered the use of probing to map the image space and
to measure the robustness of an optical correlator with respect to deviations in the 
input from training images. Second, we have considered the use of Bayesian inference 
in the design of composite correlators in which images are assigned to object classes 
consisting of a collection of training images. 